Efficient Algorithms for Decision Tree Cross-validation
Authors
Abstract
Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. The analysis is supported by experimental results.
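As a rough illustration of how shared work between folds might reduce the overhead, the sketch below computes the class counts of every cross-validation training set in a single pass over the data, instead of one pass per fold. It is not taken from the paper: the function names, the fixed fold assignment, and the choice of class counts as the shared statistic are all assumptions made for this example.

```python
from collections import Counter


def per_fold_class_counts(labels, folds, k):
    """One pass over the data: class counts for each fold separately."""
    counts = [Counter() for _ in range(k)]
    for label, fold in zip(labels, folds):
        counts[fold][label] += 1
    return counts


def training_set_class_counts(labels, folds, k):
    """Class counts of each training set (all data minus fold i).

    Hypothetical helper: derives all k results from one pass by
    subtracting each fold's counts from the overall totals.
    """
    fold_counts = per_fold_class_counts(labels, folds, k)
    total = Counter()
    for c in fold_counts:
        total.update(c)
    # Counter subtraction keeps only positive counts, which is what we want.
    return [total - c for c in fold_counts]


def naive_training_set_class_counts(labels, folds, k):
    """Reference version: one full pass over the data per fold (k passes)."""
    return [
        Counter(label for label, fold in zip(labels, folds) if fold != i)
        for i in range(k)
    ]


# Toy data: six examples assigned to three folds (assumed, for illustration).
labels = ["a", "b", "a", "a", "b", "b"]
folds = [0, 1, 2, 0, 1, 2]
fast = training_set_class_counts(labels, folds, 3)
slow = naive_training_set_class_counts(labels, folds, 3)
```

Both functions return the same counts; the single-pass version touches each example once rather than k times. A decision tree learner evaluates such statistics for every candidate test at every node, so this kind of sharing is one plausible source of savings.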
Similar resources
Efficient Algorithms for Decision Tree Cross-validation (Extended Abstract)
Extended abstract: Cross-validation is a generally applicable and very useful technique for many tasks often encountered in machine learning, such as accuracy estimation, feature selection or parameter tuning. A common property of these tasks is that one wants to validate a learned theory on a set of examples not used for its construction (i.e., an "independent test set"). When insufficient data a...
Efficient algorithms for decision tree cross-validation
Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the crossvalida...
Cross-Validated C4.5: Using Error Estimation for Automatic Parameter Selection
Machine learning algorithms for supervised learning are in wide use. An important issue in the use of these algorithms is how to set the parameters of the algorithm. While the default parameter values may be appropriate for a wide variety of tasks, they are not necessarily optimal for a given task. In this paper, we investigate the use of cross-validation to select parameters for the C4.5 decis...
Evaluation of Best First Decision Tree on Categorical Soil Survey Data for Land Capability Classification
Land capability classification (LCC) of a soil map unit is sought for sustainable use, management and conservation practices. The high speed, high precision and simple rule generation of machine learning algorithms can be utilized to construct pre-defined rules for LCC of soil map units in developing decision support systems for land use planning of an area. Decision tree (DT) is one of the mos...
A Comparison of Accuracy between Decision Tree and k-NN Algorithm
Data mining has many functionalities. One of the main functions of data mining is classification, which is used to predict the class and generate information based on historical data. In classification, there are many algorithms that can be used to process the input into the desired output, thus it is very important to observe the performance of each algorithm. The purpose of this rese...